Sensitivity of searches for new signals and its optimization

Giovanni Punzi 
A frequentist definition of sensitivity of a search for new phenomena is discussed, that has several
useful properties. It is based on completely standard concepts, is generally applicable, and has a 
very clear interpretation. It is particularly suitable for optimization, being independent of a-priori 
expectations about the presence of a signal, thus allowing the determination of a single set of cuts 
that is optimal both for setting limits and for making a discovery. Simple approximate formulas are 
given for the common problem of Poisson counts with background. 



I. INTRODUCTION 

The question of the sensitivity of a search for new 
phenomena is a very common one. The need may arise
from the wish to predict the outcome of an experiment,
or to compare several possible experiments or different
configurations of the same experiment. Several
different ways have been used to quantify the sensi- 
tivity of a search, which makes it sometimes difficult 
to compare them. In particular, two different sen- 
sitivity figures are often quoted, one that is relative 
to the potential for actually making a discovery, and 
another to characterize how strong a constraint is im- 
posed on the unknown phenomena if no evidence is 
found for a deviation from the standard theory. This 
situation makes it difficult to optimize the design of 
an experiment, because it is not clear what should be 
maximized. I describe here a definition of sensitivity 
which is unique and well-defined for any experiment. 
This is based on purely frequentist ideas, which avoids 
the issue of the choice of an a-priori distribution for
new and unknown phenomena.



II. STATEMENT OF THE PROBLEM 

The problem of searches for new phenomena can be
stated formally in classical statistics as one of
"hypothesis testing". We have a "default hypothesis"
H0, that is our current best theory, and as a result
of the experiment we wish to either confirm or disprove
the theory H0, in favor of an alternative theory
Hm, where m indicates the free parameters of the new
theory (mass or set of masses of new particles, coupling
constants, production cross sections, etc.). The
experiment consists of measuring the value of a set of
observables X (possibly a large number) whose distribution
depends on the true state of nature being H0 or
Hm. In a simple counting experiment, the observable
X is the number of observed counts, and hypothesis
H0 is defined as the distribution of X being a Poisson
with the mean equal to the number of expected background
events B. Hypothesis Hm is that the distribution
is instead a Poisson with a larger mean B + Sm,
where Sm is the expected contribution of the "new
signal", which is a function of the unknown free
parameters of the new theory, m. A test of H0 is specified
by defining the set of values of X that will make
us decide that H0 must be rejected ("critical region");
the significance level of the test, indicated by α, is the
probability of rejecting H0 when it is indeed true; that
is to say, α is the probability for X to fall within the
critical region, calculated under the assumption that
H0 is true. There are many possible choices of the
critical region, therefore many possible different tests
at the given significance level α, and we will not be
concerned here with the way the choice is made; all of
the present discussion is independent of the way the
test was chosen.

What about the value of α? This is a "small number",
common practice for really new physics discovery
being to require α to correspond to the 5σ single
tail of a Gaussian distribution.
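
As a rough numerical illustration (a sketch, not part of the original discussion), the significance level corresponding to a given number of one-sided Gaussian standard deviations can be evaluated with the complementary error function. The function name `one_sided_tail` is introduced here only for illustration.

```python
import math

def one_sided_tail(n_sigma: float) -> float:
    """Probability for a unit Gaussian to exceed n_sigma (single tail)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

# Significance level for a 5-sigma single-tail requirement (about 2.9e-7).
alpha_5sigma = one_sided_tail(5.0)
print(f"alpha(5 sigma) = {alpha_5sigma:.3e}")

# For comparison, a 3-sigma single-tail requirement.
print(f"alpha(3 sigma) = {one_sided_tail(3.0):.3e}")
```
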

The other element to be considered in a test is the 
probability that a discovery is made. The classical 
way to express this is by the power function 1 − β(m),
that is, the probability that X will fall in the criti- 
cal region (=the probability that a discovery will be 
claimed) assuming Hm is true, as a function of the pa- 
rameters m. It is clearly desirable to have the greatest 
possible power. However, it is well known that only
in very few special problems is it possible to maximize
the power simultaneously for every m. For this
reason, trying to optimize the power is subject to a 
judgement about what values of the parameters are 
more important; in the next section we will show how 
to solve the issue by attacking the problem from a 
different angle. 

After a measurement is performed, if no discovery 
is made the experimenter will usually produce an ad- 
ditional piece of information: a confidence region for 
the unknown parameters m. This part is in princi- 
ple completely independent from the "testing" part, 
and interesting issues arise when one tries to make 
sure the two kinds of information are coherent. For 
instance, limits are often desired at a confidence level 
lower than the level of significance required for claim- 
ing a discovery; this can lead easily to situations where 
no discovery is claimed, and yet limits are quoted that 
do not include the H0 hypothesis. For the purpose of
the present discussion we don't need to deal with such
difficult issues and we will make only minimal assumptions
about the relationship between the test and the
algorithm adopted for setting limits. We will just assume
that the confidence band for m is built in such
a way as to exclude, whenever possible, all values of X
falling within the acceptance region for H0 (this can
be done for every m such that 1 − β(m) > CL, where
CL is the desired Confidence Level). This is quite
natural, and usually happens spontaneously, because 
it makes for tighter confidence regions when no dis- 
covery is made, at no expense. 

If a discovery is indeed made, the most interesting 
piece of information in the result will be the discovery 
itself, and maybe an estimate of the parameters m, so 
we will not be concerned with limit setting in case of 
discovery, only with the probability that it happens. 

III. DEFINITION OF SENSITIVITY OF A 
SEARCH EXPERIMENT 

Many definitions of sensitivity for a search have to 
do with either the "average limit" produced if H0 is
true (defined in various ways), or with the significance 
of an observed signal, assuming the observation is ex- 
actly equal to the expected value in presence of a sig- 
nal at m. 

We suggest characterizing the sensitivity of an experiment
in the following way. Correct statistical
practice requires deciding before the experiment the
values of α and CL, so we assume their values are
given. Then one can proceed by quoting the region of
the parameters m for which the power of the chosen
test is greater or equal to the Confidence Level chosen
for the limits in case there is no discovery:

$$ 1 - \beta_\alpha(m) > CL \tag{1} $$

This region of m can be thought of as a region of 
parameters to which the experiment is "sufficiently 
sensitive". While it is always possible to provide ad- 
ditional information by plotting contours of constant 
power in the m space for values different from the CL, 
the specific region defined by eq. (1) is particularly
informative because it has a very simple and clear-cut
interpretation. In fact, it is easy to verify that the 
following two statements hold simultaneously: 

• If the true value of m satisfies eq. (1), then there is a
probability at least CL that performing the experiment
will lead to discovery (with the chosen
significance α).

• If performing the experiment does not lead to 
discovery, the resulting limits will exclude (at 
least) the entire region defined by eq. (1), at the
chosen CL. (N.B. this relies on the minimal as- 
sumption of a "reasonable algorithm" for setting 
limits made in previous section, and holds inde- 
pendently of the true value of m.) 



In short, eq. (1) defines the region in the parameter
space for which the experiment will certainly give
an answer: that region will be excluded, or a dis- 
covery will be claimed, with no possible in-between. 
This double discovery/exclusion interpretation sug- 
gests that it deserves to be named sensitivity region 
for the experiment and to be quoted as the single most 
useful information to characterize its potential and op- 
timize it. Note explicitly that there is no possibility 
for an experimental fluctuation to jeopardize the re- 
sult; it is possible for a fluctuation to increase the 
region of exclusion, but not to diminish it. In partic- 
ular, if the parameter region covers the whole range 
of physically interesting values for m, the experiment 
can very well be said to be conclusive. This sensitivity
region appears to be a more useful piece of information
than others commonly quoted, which have a vaguer
meaning, like:

• the "average" excluded region, if H0 is true (tells
you nothing certain about the actual limits that 
will be quoted; tells you nothing about what will 
happen if the signal exists but it is small) 

• an "average number of sigmas", for given values
of m, or the number of sigmas you would get 
in case exactly the expected number of signal 
events is observed (tells you nothing about the 
limits in case there is no observation; tells you 
little about how likely it is that a signal will ac- 
tually be observed, due to the effect of statistical 
fluctuations) 

Comparison between two experiments, or exper- 
imental settings, should be made on the basis of 
whether one sensitivity region includes the other. 
It is still possible for two experiments to be
non-comparable, when neither of the two regions completely
includes the other; in that case, the issue of
which is preferable cannot be resolved on a statistical 
basis, but it is a question of strategy. If the sensitivity 
regions are very different, the actual conclusion is that 
the two experiments are somehow 'complementary', 
probing different regions of the parameter space.

There are a few other arguments in favor of quot- 
ing this quantity to characterize the sensitivity of an 
experiment: 

• The definition is independent of the choice 
of metric (in both observable and parameter 
space). 

• It does not require a choice of priors.

• It is straightforward (and meaningful) to apply 
even in complex situations. For instance: 

— 1-D problems with a "non monotonic"
structure. Example: search for a CP violation
effect, where one measures the sine
of an angle, with the range [−1, 1]. In this
case H0 is in the middle, and it makes no
sense to quote an "average upper limit".

— multidimensional parameter problems. Examples
of this kind are neutrino oscillation
searches, where the space is 2-D.
Even more complex examples are found in
CP-violation measurements in neutral B
meson oscillations, where both a direct
and a mixed component are possible; in
this case the allowed region for the parameters
is a circle of unit radius, H0 being at the
center, and it is impossible to use concepts
like "average upper limit", or even "median
of the limit".

• It is independent of the expectations for a signal 
to be present, thus allowing an unbiased opti- 
mization. 

• It allows you to optimize what you really want 
for a search, without being distracted by other 
elements. For instance, if one had to concentrate 
on getting the maximum possible power (e.g. by 
looking at its average over a chosen region),
one can easily be fooled into preferring an experiment
that has a very high power in a region
where the power is pretty high anyway, over
one that has a more even distribution of power,
that is actually much more likely to provide useful
information, since in a discovery measurement
the power counts the most where it is
"intermediate". Considering the region rather than
power in itself takes this into account.

IV. OPTIMIZATION OF A COUNTING 
EXPERIMENT 

We will now apply the ideas discussed in the previ- 
ous section to the very common problem of a counting 
experiment in presence of background. In this case, we 
have the discrete observable n, the number of events 
observed, which is Poisson-distributed with a mean
determined by B, the expected number of background
events (supposed known), and the possible contribution
of signal events Sm:

$$ p(n|H_0) = e^{-B} B^n / n! \tag{2} $$

$$ p(n|H_m) = e^{-B-S_m} (B + S_m)^n / n! \tag{3} $$

For this problem, the only sensible definition of a critical
region for the presence of non-zero signal Sm takes
the form of a condition like

$$ n > n_{min} $$

Therefore, the test is completely defined once the
desired significance level α is chosen. Figure 1 shows
the value of nmin as a function of B, for given values
of α, obtained by numerical calculation of sums of
Poisson probabilities.
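
The numerical calculation mentioned above can be sketched as follows. This is a minimal illustration rather than the code used for the figure: it assumes the critical region is the strict inequality n > nmin, and the helper names `poisson_pmf`, `poisson_sf` and `n_min` are introduced here. The upper tail is summed directly so that very small values of α are not lost to rounding.

```python
import math

def poisson_pmf(n: int, mu: float) -> float:
    """Poisson probability of observing n counts for mean mu."""
    if mu <= 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

def poisson_sf(k: int, mu: float) -> float:
    """P(n > k) for a Poisson of mean mu, summing the upper tail directly."""
    total, n = 0.0, k + 1
    while True:
        term = poisson_pmf(n, mu)
        total += term
        # Past the mean the terms only decrease, so stop once negligible.
        if n > mu and term <= 1e-17 * total:
            return total
        n += 1

def n_min(B: float, alpha: float) -> int:
    """Smallest threshold such that P(n > n_min | H0) <= alpha."""
    n = 0
    while poisson_sf(n, B) > alpha:
        n += 1
    return n

# Threshold versus expected background for a 95% significance test.
for B in (0.0, 0.5, 1.0, 5.0, 10.0):
    print(B, n_min(B, 0.05))
```

Because n is discrete, the resulting threshold is a step function of B, and the actual significance level of the test is in general somewhat smaller than the nominal α.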



FIG. 1: Minimum number of observed events needed to
claim discovery with 95%, 3σ, 5σ significance, vs expected
background.

Having completely defined the test, we can now 
evaluate its power as a function of m, and determine 
the set of values for m such that eq. (1) holds. Since
the power of a test of the form n > nmin grows monotonically
with Sm, it is easy to see that eq. (1) leads
to simple inequalities of the form:

$$ S_m > S_{min} $$

Therefore, all that is needed to completely characterize
the solution of our problem is the value of Smin, that
is in general a function of α, β, and B. Plots of Smin
from numerical calculation are shown in Fig. 2.
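
A minimal sketch of how Smin could be obtained numerically, under the same assumptions as the text (critical region n > nmin, power required to reach CL). The function names and the bisection tolerance are choices made here for illustration, not taken from the original calculation.

```python
import math

def poisson_sf(k: int, mu: float) -> float:
    """P(n > k) for a Poisson of mean mu, summing the upper tail directly."""
    if mu <= 0.0:
        return 0.0
    total, n = 0.0, k + 1
    while True:
        term = math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))
        total += term
        if n > mu and term <= 1e-17 * total:
            return total
        n += 1

def n_min(B: float, alpha: float) -> int:
    """Smallest threshold such that P(n > n_min | H0) <= alpha."""
    n = 0
    while poisson_sf(n, B) > alpha:
        n += 1
    return n

def s_min(B: float, alpha: float, cl: float, tol: float = 1e-6) -> float:
    """Smallest signal S for which the power P(n > n_min | B + S) reaches cl."""
    k = n_min(B, alpha)
    lo, hi = 0.0, 1.0
    while poisson_sf(k, B + hi) < cl:   # bracket the solution
        hi *= 2.0
    while hi - lo > tol:                # bisection: the power grows with S
        mid = 0.5 * (lo + hi)
        if poisson_sf(k, B + mid) >= cl:
            hi = mid
        else:
            lo = mid
    return hi

for B in (0.0, 1.0, 10.0):
    print(B, round(s_min(B, alpha=0.05, cl=0.95), 3))
```

With B = 0 and α = 0.05 the threshold is nmin = 0, so the condition reduces to 1 − exp(−S) ≥ 0.95, which gives Smin = ln 20; this provides a simple check of the sketch.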



FIG. 2: The lower limit of the sensitivity region Smin, for
a search experiment with (significance, CL) respectively of
(95%,95%), (3σ,95%), (5σ,90%).

Tables of this kind of data can in principle be used 
to compare different experimental settings, by deter- 
mining for each of them the set of values of m such 
that Sm > Smin, and choosing the one with the 
largest set. However, it is much easier to perform 
such optimization tasks with the help of an analytic
parametrization. For the purpose of optimization, an
approximation of the exact result is usually sufficient;
in particular, there is no need to account for the dis- 
cretization effects. 

A simple parametrization of our result can be obtained
by means of the Gaussian approximation of the
Poisson. It is easy to see that in this approximation,
condition (1) translates into the following equation for
Smin:

$$ S_{min} = a\sqrt{B} + b\sqrt{B + S_{min}} \tag{4} $$

where a and b are the number of sigmas corresponding
to one-sided Gaussian tests at significance α and
β respectively.
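
The step from condition (1) to eq. (4) can be spelled out as follows. This is a sketch of the reasoning under the stated Gaussian approximation, treating n as a continuous variable; the intermediate expressions are added here for illustration.

```latex
% Under H_0, n is approximately Gaussian with mean B and width \sqrt{B},
% so a one-sided test at a sigmas places the threshold at
n_{min} \simeq B + a\sqrt{B}
% Under H_m, n is approximately Gaussian with mean B + S_m and width
% \sqrt{B + S_m}. Requiring power 1 - \beta means that the threshold
% must lie at least b sigmas below that mean:
B + S_m - b\sqrt{B + S_m} \;\ge\; n_{min} \simeq B + a\sqrt{B}
% At the boundary S_m = S_{min}, the background term B cancels:
S_{min} = a\sqrt{B} + b\sqrt{B + S_{min}}
```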

Solving eq. (4) for Smin yields the solution:

$$ S_{min} = \frac{b^2}{2} + a\sqrt{B} + \frac{b}{2}\sqrt{b^2 + 4a\sqrt{B} + 4B} \tag{5} $$
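
As a quick consistency check (a sketch with illustrative function names), the closed form of eq. (5) can be substituted back into eq. (4) and the residual evaluated numerically.

```python
import math

def s_min_gauss(B: float, a: float, b: float) -> float:
    """Closed-form Gaussian-approximation S_min, eq. (5)."""
    root = math.sqrt(b * b + 4.0 * a * math.sqrt(B) + 4.0 * B)
    return b * b / 2.0 + a * math.sqrt(B) + (b / 2.0) * root

def eq4_residual(S: float, B: float, a: float, b: float) -> float:
    """Left side minus right side of eq. (4); zero when S solves it."""
    return S - (a * math.sqrt(B) + b * math.sqrt(B + S))

for B in (0.0, 0.1, 1.0, 10.0, 100.0):
    S = s_min_gauss(B, a=5.0, b=2.0)
    print(B, S, eq4_residual(S, B, a=5.0, b=2.0))
```

Two limits are easy to read off from eq. (5): for B = 0 it reduces to Smin = b², and for large B it approaches (a + b)√B.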

This expression holds for one specific set of data se- 
lection criteria. Now consider the common situation 
where one has to decide on the set of cuts to be used 
in the analysis. This means that both the background 
B and the number of expected signal events Sm will 
depend on the cuts (let's indicate the whole set of cuts 
with the symbol t). In a completely general case, in 
order to decide which set of cuts t is best, one needs 
to determine for every t the set of values m to which 
the experiment is sensitive, by solving for m the in- 
equality: 

$$ S_m(t) > \frac{b^2}{2} + a\sqrt{B(t)} + \frac{b}{2}\sqrt{b^2 + 4a\sqrt{B(t)} + 4B(t)} $$

and then choose the cuts t yielding the most ex- 
tended region. The situation is much simpler when 
the efficiency ε of the chosen cuts on the signal is
independent of m, that is when one can write:

$$ S_m(t) = \epsilon(t) \cdot L \cdot \sigma_m $$

where L is the integrated luminosity and σm is the
cross section of the process being searched for.

In this case one can simply invert the above equa- 
tion to write down the minimum "detectable" (accord- 
ing to our criteria) cross section: 



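$$ \sigma_{min} = \frac{\frac{b^2}{2} + a\sqrt{B(t)} + \frac{b}{2}\sqrt{b^2 + 4a\sqrt{B(t)} + 4B(t)}}{\epsilon(t) \cdot L} $$

Obviously, the maximum sensitivity is attained
when σmin is smallest, that is when the quantity:

$$ \frac{\epsilon(t)}{b^2 + 2a\sqrt{B(t)} + b\sqrt{b^2 + 4a\sqrt{B(t)} + 4B(t)}} \tag{6} $$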



reaches its maximum. Note explicitly that, in the 
given assumption of the efficiency being independent 
of m, the optimal choice of cuts does not depend on the 
assumed cross section for the new process σm. This
is a very useful feature, since this parameter is often 
unknown, and it is a direct consequence of the cho- 
sen approach, that focuses on maximizing the power 



where it is really necessary, that is at the threshold of 
visibility. Expression (6) becomes even simpler when
the choice b = a is made:

$$ \frac{\epsilon(t)}{a/2 + \sqrt{B(t)}} \tag{7} $$
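
The following sketch shows how expressions (6) and (7) might be used to choose among candidate sets of cuts. The three candidate selections, with their efficiencies and backgrounds, are invented purely for illustration, as are the function names.

```python
import math

def fom_general(eff: float, bkg: float, a: float, b: float) -> float:
    """Figure of merit of eq. (6)."""
    sb = math.sqrt(bkg)
    return eff / (b * b + 2.0 * a * sb
                  + b * math.sqrt(b * b + 4.0 * a * sb + 4.0 * bkg))

def fom_simple(eff: float, bkg: float, a: float) -> float:
    """Figure of merit of eq. (7), i.e. the b = a case."""
    return eff / (a / 2.0 + math.sqrt(bkg))

def best_cut(cuts, a: float) -> str:
    """Name of the candidate selection that maximizes eq. (7)."""
    return max(cuts, key=lambda c: fom_simple(c[1], c[2], a))[0]

# Hypothetical selections: (name, signal efficiency, expected background).
cuts = [("loose", 0.9, 100.0), ("medium", 0.6, 10.0), ("tight", 0.3, 0.5)]

for name, eff, bkg in cuts:
    print(name, round(fom_simple(eff, bkg, a=5.0), 4))
print("preferred:", best_cut(cuts, a=5.0))
```

No signal cross section enters the calculation, in line with the remark above. For b = a, expression (6) equals expression (7) divided by the constant 4a, so the two select the same cuts.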



This simple expression is adequate in most prob- 
lems of search optimization; also, it is readily com- 
pared with some "significance-like" expressions that 
are commonly used for optimization purposes: 



a) $S/\sqrt{B}$

b) $S/\sqrt{B+S}$



Note that expression b) cannot be maximized 
without knowing explicitly the cross section for the 
searched signal. Also, it does not quite represent what 
one wants to maximize for a search, being more di- 
rectly related to the relative uncertainty in the mea- 
surement of the yield of a new process, if found, than 
to significance. Expression a), being linear in S, shares
with expression (7) the good property of being independent
of the cross section of the new process, but it
has the important problem of breaking down at small 
values of B. Imposing maximization of a) may push 
the experiment efficiency down to very small values. 
In order to see the failure of expression a), it is suf- 
ficient to consider, for instance, that it will prefer an 
expectation of 0.1 signal events with a background of 
10^-5 over a situation with 10 signal events expected
and a background of 1 event. 
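
The failure can be checked numerically. In the sketch below the first configuration uses a background of 1e-5, taken here as an illustrative value (any background below 1e-4 gives the same ordering), and the choice a = b = 5 is likewise an assumption. The ratio S/Smin, with Smin from eq. (5), stands in for the sensitivity-based criterion.

```python
import math

def s_over_sqrt_b(S: float, B: float) -> float:
    """Expression a): S / sqrt(B)."""
    return S / math.sqrt(B)

def s_min_gauss(B: float, a: float, b: float) -> float:
    """Gaussian-approximation S_min, eq. (5)."""
    root = math.sqrt(b * b + 4.0 * a * math.sqrt(B) + 4.0 * B)
    return b * b / 2.0 + a * math.sqrt(B) + (b / 2.0) * root

def sensitivity_ratio(S: float, B: float, a: float, b: float) -> float:
    """Expected signal relative to the minimum signal of the sensitivity region."""
    return S / s_min_gauss(B, a, b)

tiny = (0.1, 1e-5)    # 0.1 signal events, assumed background of 1e-5
sizable = (10.0, 1.0)  # 10 signal events, background of 1 event

print("S/sqrt(B):", s_over_sqrt_b(*tiny), s_over_sqrt_b(*sizable))
print("S/S_min  :", sensitivity_ratio(*tiny, 5.0, 5.0),
      sensitivity_ratio(*sizable, 5.0, 5.0))
```

Expression a) ranks the first configuration higher even though a tenth of an expected signal event can hardly lead to a discovery, while the ratio to Smin ranks the second configuration higher.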

It should be apparent that expression (7) (or its
slightly more sophisticated form (6)), compared with
"significances" a) and b), is not only better motivated, 
but also unambiguously preferable from a practical 
viewpoint. 

The features of the discussed formulas are more eas- 
ily seen by plotting the factor 1 / Smin from the exact 
calculation (that is proportional to the quantity that 
needs to be maximized, as in eq. (6)) together with
the two significance-like expressions discussed above:
they all behave as 1/√B at large B, and it is therefore
possible to normalize them to converge as B → ∞.
Expression b) is not simply proportional to S, so we
had to make a choice and we put S/√(B+S) = a, in agreement
with the spirit of our current approach of focusing
on the point where significance is at the threshold,
and solved for 1/S to obtain a function of B only.

The comparison is shown in Fig. 3, where it appears
that our suggested solution lies between a) and
b), where a) largely overestimates the "sensitivity"
at low backgrounds, as expected, and conversely b)
underestimates it, especially for high significance settings.

The Gaussian approximation to the exact solution
is shown instead in Fig. 4, and its special case for b ≈ a
in Fig. 5.
FIG. 3: Comparison of 1/Smin with the corresponding
sensitivity factor given by S/√B (dotted) and S/√(S+B)
(dashed), for a search experiment with (significance, CL)
respectively of (95%,95%), (3σ,95%), (5σ,90%).

FIG. 4: Gaussian approximation of the "Sensitivity factor"
1/Smin (eq. (5)) for a search experiment with (significance,
CL) respectively of (95%,95%), (3σ,95%), (5σ,90%).

It can be seen that the approximate formulas work 
well at moderate values of a and b, but become less
accurate when high significance/CL are desired, due 
to the larger deviations from Gaussian behavior that 
occur in the Poisson far tails. However, the Gaussian 
approximation can easily be improved, without los- 
ing the good features of the solutions. For instance, 
it is possible to obtain a more accurate expression by 
accounting for differences between Gaussian and Pois- 
son tail integrals at the next order in a and b, simply 
by performing an empirical fit. This results in the 
following improved expression for Smin:

$$ S_{min} = \frac{a^2}{8} + \frac{9b^2}{13} + a\sqrt{B} + \frac{b}{2}\sqrt{b^2 + 4a\sqrt{B} + 4B} \tag{8} $$
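
The two parametrizations can be compared side by side with a short sketch. It assumes a 5σ significance (a = 5) and CL = 90%, with b taken as the one-sided Gaussian quantile for β = 1 − CL; the function names are introduced here for illustration.

```python
import math
from statistics import NormalDist

def s_min_gauss(B: float, a: float, b: float) -> float:
    """Plain Gaussian approximation, eq. (5)."""
    root = math.sqrt(b * b + 4.0 * a * math.sqrt(B) + 4.0 * B)
    return b * b / 2.0 + a * math.sqrt(B) + (b / 2.0) * root

def s_min_improved(B: float, a: float, b: float) -> float:
    """Empirically improved approximation, eq. (8)."""
    root = math.sqrt(b * b + 4.0 * a * math.sqrt(B) + 4.0 * B)
    return (a * a / 8.0 + 9.0 * b * b / 13.0
            + a * math.sqrt(B) + (b / 2.0) * root)

a = 5.0                           # 5 sigma significance
b = NormalDist().inv_cdf(0.90)    # one-sided quantile for CL = 90%

for B in (0.1, 1.0, 10.0, 100.0):
    print(B, round(s_min_gauss(B, a, b), 3), round(s_min_improved(B, a, b), 3))
```

The two expressions differ by the B-independent constant a²/8 + 5b²/26, so the correction matters most at low background and becomes negligible relative to Smin at large B.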



Fig. 6 shows this slightly modified expression to
be quite accurate even at high significances,
which makes it suitable also for searches of "really
new" effects, where a significance level of 5σ is a
customary requirement.

FIG. 5: Gaussian approximation of 1/Smin in the b ≈ a
approximation (eq. (7)), for a search experiment with
(significance, CL) respectively of (95%,95%), (3σ,95%),
(5σ,90%). Curves are normalized to the asymptotic limit.

FIG. 6: Improved Gaussian approximation of the "Sensitivity
factor" 1/Smin (eq. (8)) for a search experiment with
(significance, CL) respectively of (95%,95%), (3σ,95%),
(5σ,90%).